Corpus-based identification and refinement of semantic classes

Authors

  • Adeline Nazarenko
  • Pierre Zweigenbaum
  • Jacques Bouaud
  • Benoit Habert
Abstract

Medical Language Processing (MLP), especially in specific domains, requires fine-grained semantic lexica. We examine whether robust natural language processing tools, applied to a representative corpus of a domain, help in building and refining a semantic categorization. We test this hypothesis with ZELLIG, a corpus analysis tool. The first clusters we obtain are consistent with a model of the domain, as found in the SNOMED nomenclature. They correspond to coarse-grained semantic categories, but also isolate lexical idiosyncrasies belonging to the clinical sub-language. Moreover, they help categorize additional words.

Similar resources

Concordance-Based Data-Driven Learning Activities and Learning English Phrasal Verbs in EFL Classrooms

In spite of the highly beneficial applications of corpus linguistics in language pedagogy, it has not found its way into mainstream EFL. The major reasons seem to be the teachers’ lack of training and the unavailability of resources, especially computers in language classes. Phrasal verbs have been shown to be a problematic area of learning English as a foreign language due to their semantic op...

Analysis of User query refinement behavior based on semantic features: user log analysis of Ganj database (IranDoc)

Background and Aim: Information systems cannot be well designed or developed without a clear understanding of users' needs and of how they seek and evaluate information. This research was designed to analyze the query refinement behaviors of users of Ganj (Iranian research institute of science and technology database) via log analysis. Methods: The method of this research is log anal...

Named Entity Recognition in Persian Text using Deep Learning

Named entity recognition is a fundamental task in the field of natural language processing. It is also regarded as a subtask of information extraction. The process of recognizing named entities aims at finding proper nouns in the text and classifying them into predetermined classes such as names of people, organizations, and places. In this paper, we propose a named entity recognizer which benefi...

Identification and Distribution of Interactional Contexts in EFL Classes: The Effect of Two Contextual Factors

This study aims at empirically furthering awareness of the organization of interaction in EFL classes. Informed by the methodological framework of conversation analysis, it draws upon a corpus of 52 three-hour naturally occurring classroom interactions to identify classroom interactional contexts based on the structuring of the pedagogic goals in turn-taking sequences. Conversation analytic proc...

Identification and Extraction of Memes represented as Semantic Networks from Free Text Online Forums

Memes have recently come into vogue in the context of 'viral' transmission of basic information units in online social networks. However, from their original general definition in a sociological context, there is still much work to be done from an information technology viewpoint. This includes such issues as how to process memes from a real text corpus, formal definitions for knowledge represent...

Journal title:
  • Proceedings: a conference of the American Medical Informatics Association. AMIA Fall Symposium

Publication date: 1997